Geometricity of Residue Interaction Graphs
Authors
Abstract
Finding a well-fitting null model for biological networks is important in many research areas. A good model should generate graphs that resemble real data as closely as possible across a wide range of statistical measures. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, a single global summary statistic such as the degree distribution may not be detailed enough to capture the complex topological characteristics of a network. Here, we consider residue interaction graphs (RIGs), network representations of protein structures with residues as nodes and inter-residue interactions as edges. The RIGs analyzed in this study are derived from a structurally diverse data set covering nine proteins. For each protein, in addition to a series of distance cut-offs, we examine three different “contact types”: we denote by “BB” (“SC”) the RIGs that contain as edges only the residue pairs that have heavy backbone (side-chain) atoms within the given distance cut-off; we denote by “ALL” the most commonly used RIG model, in which all heavy atoms of every residue are taken into account when determining residue interactions. To find a well-fitting network model for RIGs, we evaluate the fit of RIGs to five random graph models: Erdős-Rényi random graphs (“ER”) [1], random graphs with the same degree distribution as the RIGs (“ER-DD”), 3-dimensional geometric random graphs constructed using Euclidean boxes and the Euclidean distance norm (“GEO-3D”) [2], Barabási-Albert type scale-free networks (“SF-BA”) [3], and stickiness-index-based networks (“STICKY”) [4]. Each generated model network that corresponds to a RIG has the same number of nodes as the RIG and a number of edges within 1% of the RIG's. Exact comparisons of large networks are computationally infeasible due to the NP-completeness of the underlying subgraph isomorphism problem [5].
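The RIG construction and the GEO-3D model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input format (a list of heavy-atom coordinates per residue), and the way the model's edge count is matched to the RIG (keeping the m shortest point pairs, which is the same as thresholding at the m-th smallest distance) are all assumptions made for illustration. Passing only backbone or only side-chain atoms would correspond to the “BB” or “SC” contact types.

```python
import itertools
import math
import random


def residue_interaction_graph(residues, cutoff):
    """Return the edge set of a RIG (hypothetical helper).

    residues[i] is a list of (x, y, z) heavy-atom coordinates of residue i.
    Residues i and j are joined when any of their atom pairs lies within
    `cutoff` of each other.
    """
    edges = set()
    for i, j in itertools.combinations(range(len(residues)), 2):
        if any(math.dist(a, b) <= cutoff
               for a in residues[i] for b in residues[j]):
            edges.add((i, j))
    return edges


def geo3d_graph(n, m, seed=None):
    """3D geometric random graph with n nodes and m edges (assumed scheme).

    Points are placed uniformly at random in the unit cube. Keeping the m
    closest pairs equals connecting all pairs within a radius set to the
    m-th smallest pairwise Euclidean distance.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    pairs = sorted(
        itertools.combinations(range(n), 2),
        key=lambda p: math.dist(points[p[0]], points[p[1]]),
    )
    return set(pairs[:m])


# Toy example: three single-atom "residues" on a line, 4.0 cut-off.
toy = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
rig = residue_interaction_graph(toy, cutoff=4.0)
model = geo3d_graph(n=len(toy), m=len(rig), seed=0)
```

In the toy example only the first two residues are within the cut-off, so the RIG has a single edge and the matching model graph is generated with three nodes and one edge.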
To evaluate the fit of the data to the model networks, we therefore compare the RIGs to the model networks with respect to a set of network properties. To overcome the limitations introduced by using a single network property (such as the degree distribution), we perform a fine-grained analysis of RIGs based on a variety of local and global network properties. The local properties used in this study are based on graphlets, small connected non-isomorphic induced subgraphs of large networks [6]. The two local properties that we use are relative graphlet frequency distance (RGF-distance) [6] and graphlet degree distribution agreement (GDD-agreement) [7]. Additionally, we use four global network properties: the degree distribution, the clustering coefficient, the clustering spectrum, and the average network diameter. We show that 3-dimensional geometric random graphs provide the best fit to these RIGs for all reasonable and practically used distance cut-offs. All analyzed local and global network properties support the superiority of the GEO-3D model. Illustrations showing the fit of one of the analyzed proteins to the five network models according to GDD-agreements and RGF-distances are presented in Figure 1. For all distance cut-offs and all contact types, RGF-distances and GDD-agreements between the RIGs and the model networks strongly favor geometric random graphs. Similar trends hold for the other network properties and the other proteins. To summarize, we address the important issue of finding a well-fitting null model for protein structure networks. We show that geometric random graphs fit RIGs better than four other random graph models, for RIGs that correspond to nine structurally different proteins and are constructed using three different contact types and a series of residue distance cut-off values.
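Two of the four global properties named above, the degree distribution and the clustering coefficient, can be computed from an edge set as in the sketch below. The helper names are hypothetical, and the convention that nodes of degree below 2 contribute zero to the average is one common choice assumed here; the abstract does not state which convention the study used.

```python
from collections import Counter, defaultdict
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_distribution(edges, n):
    """Map each degree k to the number of nodes 0..n-1 having degree k."""
    adj = adjacency(edges)
    return Counter(len(adj[v]) for v in range(n))


def average_clustering(edges, n):
    """Mean local clustering coefficient over nodes 0..n-1.

    Nodes with fewer than two neighbours contribute 0 (assumed convention).
    """
    adj = adjacency(edges)
    total = 0.0
    for v in range(n):
        nbrs = adj[v]
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours of v.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n if n else 0.0


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy_edges = {(0, 1), (1, 2), (0, 2), (2, 3)}
dist = degree_distribution(toy_edges, n=4)
cc = average_clustering(toy_edges, n=4)
```

Applying the same functions to a RIG and to each of its model networks gives values that can then be compared model by model.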
The superiority of the GEO-3D fit is demonstrated by two highly constraining measures of local network structure, as well as four standard measures of global network structure. Our geometric random graph null model may facilitate further graph-based studies of protein conformation space and the discovery of significant structural motifs. This analysis may also have important implications for protein structure comparison and prediction.
References
[1] Erdős, P. and Rényi, A. (1959) On random graphs. Publicationes Mathematicae, 6, 290–297.
[2] Penrose, M. (2003) Geometric Random Graphs, Oxford University Press.
[3] Barabási, A.-L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286(5439), 509–512.
[4] Pržulj, N. and Higham, D. (2006) Modelling protein-protein interaction networks via a stickiness index. Journal of the Royal Society Interface, 3(10), 711–716.
[5] Cook, S. (1971) In Proc. 3rd Ann. ACM Symp. on Theory of Computing, Association for Computing Machinery, pp. 151–158.
[6] Pržulj, N., Corneil, D. G., and Jurisica, I. (2004) Modeling interactome: Scale-free or geometric? Bioinformatics, 20(18), 3508–3515.
[7] Pržulj, N. (2006) Biological network comparison using graphlet degree distribution. Bioinformatics, 23, e177–e183.
Similar resources
Lattices associated with distance-regular graphs
Let L be a finite set associated with cliques of a distance-regular graph of order (s, t), with d-cliques of Johnson graphs and antipodal distance-regular graphs of diameter d, respectively. If we partially order L by the ordinary inclusion, three families of finite atomic lattices are obtained. This article discusses their geometricity, and computes their characteristic polynomials. ...
A critical analysis of computational protein design with sparse residue interaction graphs
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, w...
Clustering Implies Geometry in Networks.
Network models with latent geometry have been used successfully in many applications in network science and other disciplines, yet it is usually impossible to tell if a given real network is geometric, meaning if it is a typical element in an ensemble of random geometric graphs. Here we identify structural properties of networks that guarantee that random graphs having these properties are geom...
Ab Initio Quantum Chemical Studies of 15N and 13C NMR Shielding Tensors in Serine and Complexes of Serine-nH2O: Investigation on Strength of the CαH…O Hydrogen bonding in the Amino Acid Residue.
In this paper, the hydrogen bonding (HB) effects on the NMR chemical shifts of selected atoms in serine and serine-nH2O complexes (from one to ten water molecules) have been investigated with quantum mechanical calculations of the 15N and 13C tensors. Interaction with water molecules causes important changes in geometry and electronic structure of serine. For the compound studied, the most importan...
Lattices Generated by Two Orbits of Subspaces under Finite Singular Symplectic Groups
In the paper titled “Lattices generated by two orbits of subspaces under finite classical group” by Wang and Guo, the subspaces in the lattices are characterized and the geometricity is classified. In this paper, the result above is generalized to singular symplectic space. This paper characterizes the subspaces in these lattices, classifies their geometricity, and computes their characteristic po...
Publication date: 2007